Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bedrock): [MLOB-2181] fix bedrock token metrics #12230

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

ncybul
Copy link
Contributor

@ncybul ncybul commented Feb 5, 2025

Moves token counts for APM spans generated from Bedrock to the metrics field instead of the meta field. Also changes the name of these metrics from bedrock.usage.{prompt/completion}_tokens to bedrock.response.usage.{prompt/completion}_tokens to follow the same convention as the other integrations.

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Yun-Kim and others added 17 commits February 4, 2025 13:12
Removes unused snapshots from now-removed v0 tests.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- Introduces the following breaking changes to tracing components:
- Removes deprecated parameters from ``Tracer.configure(...)`` method
and removes the ``Tracer.sampler`` attribute.
- Drops support for multiple tracer instances, ``ddtrace.trace.Tracer``
can not be reinitialized.
   - Removes the deprecated ``Span.sampled`` property
- Drops support for configuring sampling rules using functions and regex
in the
  ``ddtrace.tracer.sampler.rules[].choose_matcher(...)`` method
and removes the ``timestamp_ns`` parameter from
``ddtrace.internal.rate_limiter.RateLimiter.is_allowed()``.
- Drops support for configuring ``DD_TRACE_METHODS`` with the '[]'
notation. Ensure DD_TRACE_METHODS use the ':' notation instead".
- Removes the deprecated ``ddtracer`` parameter from
``ddtrace.opentracer.tracer.Tracer()``.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
… leak (#12081)

This PR adds the environment variable
``DD_AIOHTTP_CLIENT_DISABLE_STREAM_TIMING_FOR_MEM_LEAK`` to address a
potential memory leak in the aiohttp integration. When set to true, this
flag may cause streamed response span timing to be inaccurate. The flag
defaults to false.

When this was merged #7551 we
essentially added support for getting the correct timing of streamed
responses, however it also caused us to sometimes never close spans,
which lead to a memory leak. When set to true, this flag essentially
reverts that behavior.


## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
…ogging (#12193)

Removes unneeded forksafe unregister hook. The unregister is not needed,
because we never actually registered the hook to begin with.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
This PR removes integration metrics and logs from the OpenAI
integration.

Prompt-completion logs were always experimental, rarely used, and
unofficially deprecated since the release of LLM Observability. For any
customers looking to continue getting visibility into their OpenAI call
prompt/completions, we are now pushing to migrate over to use LLM
Observability instead.

Operational Integration metrics can either be replaced with LLM
Observability or APM trace metrics instead. The only caveat of using LLM
Observability is for customers who may not want prompt/completion
logging, and the only caveat of using APM trace metrics is that token
metrics will not be included, primarily only operation metrics.

[](https://datadoghq.atlassian.net/browse/MLOB-2186)

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
This PR removes integration metrics and logs from the langchain
integration.

Prompt-completion logs were always experimental, rarely used, and
unofficially deprecated since the release of LLM Observability. For any
customers looking to continue getting visibility into their langchain
call prompt/completions, we are now pushing to migrate over to use LLM
Observability instead.

Operational Integration metrics can either be replaced with LLM
Observability or APM trace metrics instead. The only caveat of using LLM
Observability is for customers who may not want prompt/completion
logging, and the only caveat of using APM trace metrics is that token
metrics will not be included, primarily only operation metrics.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
xss vulnerability for jinja2 template engine

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
IAST taint source Header wouldn't work correctly in `werkzeug>=3.1.0`

We're tainting `EnvironHeaders.__getitem__` due to get method calls to
`__getitem__`

https://github.com/pallets/werkzeug/blob/2.3.8/src/werkzeug/datastructures/headers.py
but now the `__getitem__` and `get` methods of `EnvironHeaders` are
completely differents:

https://github.com/pallets/werkzeug/blob/main/src/werkzeug/datastructures/headers.py

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
…rics (#12223)

[applying #12026 to
3.x-staging]

This PR generalizes the helper method used to extract token metrics from
an APM span to be attached to an LLMObs span. Before, Anthropic,
Bedrock, and Open AI had specific methods on each of their integration
classes to accomplish this. Now, there is one get_llmobs_metrics_tags
utils function adapted from the google-specific
get_llmobs_metrics_tags_google function that gets reused across these
integrations as well as Vertex AI and Gemini. The Langchain integration
was excluded from this change since its logic for extracting token
metrics varies significantly compared to the other integrations.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: Nicole Cybul <[email protected]>
Copy link
Contributor

github-actions bot commented Feb 5, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/fix-bedrock-token-location-36c21044eecce688.yaml     @DataDog/apm-python
ddtrace/_trace/trace_handlers.py                                        @DataDog/apm-sdk-api-python
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability
tests/snapshots/tests.contrib.botocore.test_bedrock.test_ai21_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_amazon_embedding.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_amazon_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_amazon_invoke_stream.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_anthropic_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_anthropic_invoke_stream.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_anthropic_message_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_anthropic_message_invoke_stream.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_cohere_embedding.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_cohere_invoke_multi_output.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_cohere_invoke_single_output.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_cohere_invoke_stream_multi_output.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_cohere_invoke_stream_single_output.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_invoke_model_using_aws_arn_model_id.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_meta_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_meta_invoke_stream.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_read_error.json  @DataDog/apm-python
tests/snapshots/tests.contrib.botocore.test_bedrock.test_readlines_error.json  @DataDog/apm-python

@datadog-dd-trace-py-rkomorn
Copy link

datadog-dd-trace-py-rkomorn bot commented Feb 5, 2025

Datadog Report

Branch report: nicole-cybul/fix-bedrock-token-metrics2
Commit report: 61d8724
Test service: dd-trace-py

✅ 0 Failed, 130 Passed, 1184 Skipped, 3m 23.05s Total duration (25m 12.74s time saved)

@pr-commenter
Copy link

pr-commenter bot commented Feb 5, 2025

Benchmarks

Benchmark execution time: 2025-02-05 18:56:09

Comparing candidate commit 61d8724 in PR branch nicole-cybul/fix-bedrock-token-metrics2 with baseline commit b17990b in branch 3.x-staging.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 326 metrics, 2 unstable metrics.

@erikayasuda erikayasuda changed the base branch from 3.x-staging to main February 6, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants